Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing
Authors
Abstract
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google’s Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.
Similar resources
Active Data Warehousing: A New Breed of Decision Support
Active data warehousing is rapidly changing the landscape for deployment of decision support solutions. The trend toward actionable business intelligence demands that capabilities for tactical and event-driven decision-making be supported in addition to traditional uses of the data warehouse for strategic decision-making. The resulting challenges to deliver extreme service levels in the areas o...
Stronger Semantics for Low-Latency Geo-Replicated Storage
We present the first scalable, geo-replicated storage system that guarantees low latency, offers a rich data model, and provides “stronger” semantics. Namely, all client requests are satisfied in the local datacenter in which they arise; the system efficiently supports useful data model abstractions such as column families and counter columns; and clients can access data in a causally consistent...
Transactions with Consistency Choices on Geo-Replicated Cloud Storage
Pileus is a replicated and scalable key-value storage system that features geo-replicated transactions with varying degrees of consistency chosen by applications. Each transaction reads from a snapshot selected based on its requested consistency, from strong to eventual consistency or intermediate guarantees such as read-my-writes, monotonic, bounded, and causal.
Real-time workflow audit data integration into data warehouse systems
Workflow management systems are being increasingly used by many organizations to automate business processes and decrease costs. Audit trails from workflow management systems include significant amounts of information that can be used to analyze and monitor the performance of business processes in order to improve the efficiency. Traditional approaches for using workflow audit trail for decisio...
Near Real-time Data Warehousing with Multi-stage Trickle & Flip
A data warehouse typically is a collection of historical data designed for decision support, so it is updated from the sources periodically, mostly on a daily basis. Today’s business however asks for fresher data. Real-time warehousing is one of the trends to accomplish this, but there are a number of challenges to move towards true real-time. This paper proposes ‘Multi-stage Trickle & flip’ me...
Journal:
- PVLDB
Volume 7, Issue
Pages -
Publication date: 2014